As someone who has worked as a system administrator for a number of years, I find it easy to take for granted the tools I've used for a long time and to assume everyone has heard of them. Of course, new sysadmins get into the field every day, and even seasoned sysadmins don't all use the same tools. With that in mind, I decided to write a few columns where I highlight some common-but-easy-to-overlook tools that make life as a sysadmin (and really, any Linux user) easier. I start the series with a classic troubleshooting tool: sar.
There's an old saying: "When the cat's away the mice will play." The same is true for servers. It's as if servers wait until you aren't logged in (and usually in the middle of REM sleep) before they have problems. Logs can go a long way to help you isolate problems that happened in the past on a machine, but if the problem is due to high load, logs often don't tell the full story. In my March 2010 column "Linux Troubleshooting, Part I: High Load" (http://www.linuxjournal.com/article/10688), I discussed how to troubleshoot a system with high load using tools such as uptime and top. Those tools are great as long as the system still has high load when you are logged in, but if the system had high load while you were at lunch or asleep, you need some way to pull the same statistics top gives you, only from the past. That is where sar comes in.
Enable sar Logging
sar is a classic Linux tool that is part of the sysstat package and should be available in just about any major distribution with your regular package manager. Once installed, it will be enabled on a Red Hat-based system, but on a Debian-based system (like Ubuntu), you might have to edit /etc/default/sysstat, and make sure that ENABLED is set to true. On a Red Hat-based system, sar will log seven days of statistics by default. If you want to log more than that, you can edit /etc/sysconfig/sysstat and change the HISTORY option.
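As a quick way to confirm the Debian-style setting described above, here is a minimal sketch. The function name sysstat_enabled is hypothetical (it is not part of sysstat), and it assumes the defaults file uses a plain ENABLED="true" or ENABLED=true line:

```shell
# Hypothetical helper (not shipped with sysstat): succeed if the given
# defaults file sets ENABLED to true. The default path is the Debian-style
# location mentioned in the article.
sysstat_enabled() {
    conf="${1:-/etc/default/sysstat}"
    # Accept ENABLED=true with or without double quotes.
    [ -r "$conf" ] && grep -Eq '^[[:space:]]*ENABLED="?true"?[[:space:]]*$' "$conf"
}
```

You could then run something like `sysstat_enabled || echo "sar logging is not enabled"` on a Debian-based machine. On a Red Hat-based system, this check does not apply, because the file lives elsewhere and logging is on by default.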
Once sysstat is configured and enabled, it will collect statistics about your system every ten minutes and store them in a logfile under either /var/log/sysstat or /var/log/sa via a cron job in /etc/cron.d/sysstat. There is also a daily cron job that will run right before midnight and rotate out the day's statistics. By default, the logfiles will be date-stamped with the current day of the month, so the logs will rotate automatically and overwrite the log from a month ago.
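Because the logfiles are named after the day of the month, it is easy to work out which file holds a given day. The sketch below assumes the saDD naming shown later in this column (for example, sa01) and looks in the two directories mentioned above. The function name sa_file_for_day and the SA_DIRS override are hypothetical, introduced only for illustration:

```shell
# Hypothetical helper: print the sar logfile for a given day of the month.
# SA_DIRS is an assumed override; by default, the two directories named in
# the article are searched in order.
sa_file_for_day() {
    day="${1#0}"    # strip a leading zero so printf does not read "08" as octal
    for dir in ${SA_DIRS:-/var/log/sysstat /var/log/sa}; do
        file=$(printf '%s/sa%02d' "$dir" "$day")
        if [ -e "$file" ]; then
            printf '%s\n' "$file"
            return 0
        fi
    done
    return 1        # no logfile found for that day
}
```

For instance, `sa_file_for_day 1` would print /var/log/sysstat/sa01 on a machine where that file exists. With GNU date, `sa_file_for_day "$(date -d '3 days ago' +%d)"` should locate the file from three days back.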
CPU Statistics
After your system has had some time to collect statistics, you can use the sar tool to retrieve them. When run with no other arguments, sar displays the current day's CPU statistics:
$ sar
. . .
07:05:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
. . .
08:45:01 PM       all      4.62      0.00      1.82      0.44      0.00     93.12
08:55:01 PM       all      3.80      0.00      1.74      0.47      0.00     93.99
09:05:01 PM       all      5.85      0.00      2.01      0.66      0.00     91.48
09:15:01 PM       all      3.64      0.00      1.75      0.35      0.00     94.26
Average:          all      7.82      0.00      1.82      1.14      0.00     89.21

If you are familiar with the command-line tool top, the above CPU statistics should look familiar, as they are the same as you would get in real time from top. You can use these statistics just like you would with top, only in this case, you are able to see the state of the system back in time, along with an overall average at the bottom of the statistics, so you can get a sense of what is normal. Because I devoted an entire previous column to using these statistics to troubleshoot high load, I won't rehash all of that here, but essentially, sar provides you with all of the same statistics, just at ten-minute intervals in the past.
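When a day's worth of ten-minute samples is too much to scan by eye, a small filter can pick out the busy intervals. This is only a sketch: busy_intervals is a hypothetical name, and it assumes the 12-hour timestamp layout shown above (time, AM/PM, then "all"), with %idle as the last column. On a system that prints 24-hour timestamps, the field positions would need adjusting:

```shell
# Hypothetical filter: read sar CPU output on stdin and print the timestamp
# and %idle of every sample whose %idle is below the given threshold
# (default 90). Header and "Average:" lines are skipped because their third
# field is not "all".
busy_intervals() {
    threshold="${1:-90}"
    awk -v t="$threshold" '$3 == "all" && $NF + 0 < t { print $1, $2, $NF }'
}
```

Running `sar | busy_intervals 92` against the sample output above would list only the 09:05:01 PM sample, the one interval where %idle dropped below 92.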
RAM Statistics
sar also supports a large number of different options you can use to pull out other statistics. For instance, with the -r option, you can see RAM statistics:
$ sar -r
. . .
07:05:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
. . .
08:45:01 PM    881280   2652840     75.06    355284   1028636   8336664    183.87
08:55:01 PM    881412   2652708     75.06    355872   1029024   8337908    183.89
09:05:01 PM    879164   2654956     75.12    356480   1029428   8337040    183.87
09:15:01 PM    886724   2647396     74.91    356960   1029592   8332344    183.77
Average:       851787   2682333     75.90    338612   1081838   8341742    183.98

Just like with the CPU statistics, here I can see RAM statistics from the past similar to what I could find in top.
Disk Statistics
Back in my load troubleshooting column, I referenced sysstat as the source for a great disk I/O troubleshooting tool called iostat. Although that provides real-time disk I/O statistics, you also can pass sar the -b option to get disk I/O data from the past:
$ sar -b
. . .
07:05:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
. . .
08:45:01 PM      2.03      0.33      1.70      9.90     31.30
08:55:01 PM      1.93      0.03      1.90      1.04     31.95
09:05:01 PM      2.71      0.02      2.69      0.69     48.67
09:15:01 PM      1.52      0.02      1.50      0.20     27.08
Average:         5.92      3.42      2.50     77.41     49.97

I figure these columns need a little explanation:
tps: transactions per second.
rtps: read transactions per second.
wtps: write transactions per second.
bread/s: blocks read per second.
bwrtn/s: blocks written per second.
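Block counts are easier to reason about once converted to a byte rate. The sketch below assumes a block is 512 bytes, which is how the sar man page on Linux describes it. Treat that figure as an assumption to check against your own version's documentation. The function name blocks_to_kib is hypothetical:

```shell
# Hypothetical helper: convert a blocks-per-second figure from "sar -b"
# into KiB per second, ASSUMING 512-byte blocks.
blocks_to_kib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * 512 / 1024 }'
}
```

Under that assumption, the 31.30 bwrtn/s from the 08:45:01 PM sample works out to `blocks_to_kib 31.30`, or 15.65 KiB written per second.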
sar can return a lot of other statistics beyond what I've mentioned. If you want to see everything it has to offer, simply pass the -A option, which will return a complete dump of all the statistics it has for the day (or just browse its man page).
Turn Back Time
So by default, sar returns statistics for the current day, but often you'll want to get information a few days in the past. This is especially useful if you want to see whether today's numbers are normal by comparing them to days in the past, or if you are troubleshooting a server that misbehaved over the weekend. For instance, say you noticed a problem on a server today between 5PM and 5:30PM. First, use the -s and -e options to tell sar to display data only between the start (-s) and end (-e) times you specify:
$ sar -s 17:00:00 -e 17:30:00
Linux 2.6.32-29-server (www.example.net)    02/06/2012    _x86_64_    (2 CPU)

05:05:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
05:15:01 PM       all      4.39      0.00      1.83      0.39      0.00     93.39
05:25:01 PM       all      5.76      0.00      2.23      0.41      0.00     91.60
Average:          all      5.08      0.00      2.03      0.40      0.00     92.50

To compare that data with the same time period from a different day, just use the -f option and point sar to one of the logfiles under /var/log/sysstat or /var/log/sa that correspond to that day. For instance, to pull statistics from the first of the month:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01
Linux 2.6.32-29-server (www.example.net)    02/01/2012    _x86_64_    (2 CPU)

05:05:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
05:15:01 PM       all      9.85      0.00      3.95      0.56      0.00     85.64
05:25:01 PM       all      5.32      0.00      1.81      0.44      0.00     92.43
Average:          all      7.59      0.00      2.88      0.50      0.00     89.04

You also can add all of the normal sar options when pulling from past logfiles, so you could run the same command and add the -r argument to get RAM statistics:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r
Linux 2.6.32-29-server (www.example.net)    02/01/2012    _x86_64_    (2 CPU)

05:05:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
05:15:01 PM    766452   2767668     78.31    361964   1117696   8343936    184.03
05:25:01 PM    813744   2720376     76.97    362524   1118808   8329568    183.71
Average:       790098   2744022     77.64    362244   1118252   8336752    183.87

As you can see, sar is a relatively simple but very useful troubleshooting tool. Although plenty of other programs exist that can pull trending data from your servers and graph them (and I use them myself), sar is great in that it doesn't require a network connection, so if your server gets so heavily loaded it doesn't respond over the network anymore, there's still a chance you could get valuable troubleshooting data with sar.
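If you make this kind of day-to-day comparison often, you can reduce each run to a single number. The following sketch pulls the %idle value from the Average line of sar's CPU output. The name avg_idle is hypothetical, and the filter assumes %idle is the last column, as in the examples above:

```shell
# Hypothetical filter: read sar CPU output on stdin and print the %idle
# value from the "Average:" line for all CPUs.
avg_idle() {
    awk '$1 == "Average:" && $2 == "all" { print $NF }'
}

# Possible usage, comparing today's window with the first of the month:
#   sar -s 17:00:00 -e 17:30:00 | avg_idle
#   sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 | avg_idle
```

With the sample data in this column, those two commands would report 92.50 for today and 89.04 for the first of the month.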
Kyle Rankin is a Tech Editor and columnist at Linux Journal and the Chief Security Officer at Purism. He is the author of Linux Hardening in Hostile Networks, DevOps Troubleshooting, The Official Ubuntu Server Book, Knoppix Hacks, Knoppix Pocket Reference, Linux Multimedia Hacks and Ubuntu Hacks, and also a contributor to a number of other O'Reilly books. Rankin speaks frequently on security and open-source software including at BsidesLV, O'Reilly Security Conference, OSCON, SCALE, CactusCon, Linux World Expo and Penguicon. You can follow him at @kylerankin.